(57) Abstract

A method is disclosed of recovering from two mixtures of DNA those sequences which are present in both mixtures (coincident
sequences). DNA in a first mixture is produced in single stranded form to which are annealed capture oligonucleotides with
known sequences. The first mixture is combined with single stranded DNA in a second mixture. Sequences in the second mixture
anneal to homologous sequences in the first mixture. "Captured" homologous sequences are ligated to the capture oligonucleotides
and heteroduplex coincident DNA sequences are then recovered.
Title: Analysis of DNA



Field of invention 

This invention concerns analysis of DNA, in particular a 
method for recovering from two mixtures of DNA only those 
sequences present in both mixtures, i.e. coincident or 
shared DNA sequences in the two mixtures. 

Background to the invention 

When analysing mixtures of DNA, particularly complex 
mixtures, it would be useful to be able to identify and 
recover those sequences present in two mixtures, and in 
recent years attempts have been made to achieve this end. 
Success so far has been very limited, although a few
techniques have been developed which are applicable in
restricted situations. For example, a technique has been
developed which enables isolation of human inter-Alu
fragments coincident between overlapping human-rodent
somatic cell hybrids.

The present invention aims to provide a more versatile, 
generally applicable technique for isolating coincident 
sequences of DNA from two mixtures. 

Summary of the invention 



According to the present invention there is provided a
method of recovering from two mixtures of DNA those DNA
sequences present in both mixtures, comprising treating
DNA in a first mixture in a manner comprising the use of
restriction enzyme(s) to produce single stranded DNA
fragments with added defined flanking sequences flanking
the single stranded DNA fragments; annealing capture
oligonucleotides to the added defined flanking sequences
of the resulting single stranded DNA; treating DNA in a
second mixture using the same restriction enzyme(s) to
produce single stranded DNA fragments; combining the
products obtained from the first and second mixtures and
allowing the single stranded DNA fragments to anneal;
joining annealed single stranded DNA from the second
mixture present as a heteroduplex to the capture
oligonucleotides of DNA from the first mixture; and
recovering from the resulting mixture sequences captured
in the form of heteroduplex coincident DNA including the
capture oligonucleotides.

Only DNA sequences present in both the first and second 
mixtures will form heteroduplex coincident DNA including 
the capture oligonucleotides, with a sequence from the
first mixture carrying the capture oligonucleotides 
annealing with the coincident sequence from the second 
mixture. Other DNA fragments will either not anneal or 
will anneal with other single stranded DNA not including 
the capture oligonucleotides. The capture 
oligonucleotides can then be used to recover the 
heteroduplex DNA of interest, for example by using the 
polymerase chain reaction (PCR) against the capture 
oligonucleotides.
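
The selection logic described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: fragments are short strings, annealing is treated as exact reverse-complement matching, and the names `revcomp`, `capture_coincident` and the placeholder tags "5-" and "-3" are hypothetical and not part of the disclosed method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def capture_coincident(library_a, strands_b, cap5, cap3):
    """Tag mixture B strands that find a complementary partner in library A.

    library_a -- set of single strands from the first mixture (toy model)
    strands_b -- iterable of denatured single strands from the second mixture
    cap5/cap3 -- placeholder tags standing in for the capture oligonucleotides
    """
    tagged = []
    for strand in strands_b:
        # A B strand is "trapped" only if its complement is present in library A.
        if revcomp(strand) in library_a:
            # Ligation joins the trapped strand to both capture oligonucleotides.
            tagged.append(cap5 + strand + cap3)
    return tagged


library_a = {"AACCGG", "TTTAAC"}
strands_b = ["CCGGTT", "GGGGGG"]
result = capture_coincident(library_a, strands_b, "5-", "-3")
print(result)
```

Only the tagged strands carry both capture sequences, which is why amplification primed from the capture oligonucleotides recovers coincident sequences and nothing else in this simplified picture.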

The method of the invention, like known methodologies, 
utilises the formation of heteroduplex species to
distinguish between overlapping and non-overlapping
components in two mixtures. However, the known methods 
utilise two double stranded mixtures of DNA in the 
formation of this heteroduplex and then achieve its 
isolation by some means dependent upon pretreatment of the 
ends of the DNA molecules. In contrast, in the method of 
the present invention the DNA from the first mixture is 
initially converted into a single stranded sequence with 
defined double stranded flanking sequences (by employing 
capture oligonucleotides). This structure can then act as
a sequence specific trap for any related fragments in the
unmodified second mixture. Once a fragment is trapped, a
joining step, e.g. ligation, is employed to join the ends
of the trapped molecule to the capture oligonucleotides,
i.e. specifically to tag the coincident sequences from the
second source. By having the modified first source DNA in
excess it is possible to drive heteroduplex formation to 
completion much more readily than is possible in 
alternative schemes. 

Single stranded DNA fragments obtained from the first 
mixture by treatment with restriction enzymes are 
conveniently cloned into M13 (or other suitable cloning 
vehicles) to produce single stranded DNA copies with added 
defined flanking sequences (from M13 or the other suitable
cloning vehicle) to which the capture oligonucleotides are 
annealed. Other techniques including PCR could also be 
used for this purpose. The use of PCR has the advantage 
that it allows for greater flexibility in the choice of 
the flanking sequences by altering the sequence of the PCR 
primers.

The capture oligonucleotides may be of any suitable length
and are typically about 30-40 bases long, with for
example about half of the sequence being designed to bind
M13 and the other half being random.

The mixtures are conveniently treated with two restriction
enzymes, e.g. EcoRI and PstI, but other combinations are
clearly possible.
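
A double digestion of this kind can be sketched in silico. The example below is a minimal illustration, assuming the standard recognition sites for EcoRI (G^AATTC) and PstI (CTGCA^G) and considering the top strand only; the function name `double_digest` and the test sequence are hypothetical.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
SITES = {"EcoRI": ("GAATTC", 1), "PstI": ("CTGCAG", 5)}


def double_digest(seq):
    """Return internal fragments as (left enzyme, right enzyme, sequence)."""
    cuts = []
    for name, (site, offset) in SITES.items():
        # The lookahead also finds overlapping occurrences of a site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    cuts.sort()
    fragments = []
    # Terminal pieces have only one enzyme-cut end and are skipped.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        fragments.append((left, right, seq[start:end]))
    return fragments


seq = "AAGAATTCTTTTCTGCAGCCGAATTCGG"
fragments = double_digest(seq)
# Fragments with two different ends can be cloned in a defined orientation.
orientated = [f for f in fragments if f[0] != f[1]]
print(orientated)
```

Using two enzymes means that fragments with different ends enter a correspondingly cut vector in a single orientation, which is what gives the single stranded copies their defined flanking sequences.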

Depending on techniques used, it may be desirable to 
functionally isolate any unhybridised capture 
oligonucleotides before combining the products obtained 
from the first and second mixtures, and this is 
conveniently achieved by adding a further oligonucleotide 
with sequences complementary to the capture 
oligonucleotides. In particular, such treatment is
desirable when PCR is used to recover the coincident DNA 
of interest. 

As mentioned above, the sequences captured in the form of 
heteroduplex coincident DNA including the capture 
oligonucleotides may be recovered by PCR. Before using 
PCR it is necessary to separate these target sequences 
from the remainder of the first mixture DNA. This may be 
achieved by a size purification step, e.g. gel 
electrophoresis. Other techniques may alternatively be
used, such as specific modification or degradation of the
first mixture sequences achieved by virtue of their single
stranded nature.

It may be possible to control the specificity of the
method by regulating the degree of homology or identity
required for recovery of coincident sequences. The method
selects for DNA fragments which are perfectly matched at
both ends but a degree of internal mismatch can be
tolerated. However, it may be possible to control the
stringency of the method, and by employing techniques such
as single stranded modification and/or degradation of
annealed heteroduplexes it may be possible for sequences
which share up to 100% sequence identity to be recovered
by the method of the invention.

The method of the invention is very versatile and 
generally applicable, and is useful in the analysis of 
complex mixtures of DNA. For example, the method may be 
used for isolating coincident sequences between complex 
sources such as total mammalian genome DNA, type A e.g. 
Human, and inter-mammalian somatic cell hybrid DNA, sub-AB 
e.g. a human chromosome or fragment thereof in a complete 
rodent genome, in the absence of a background of sequences 
derived from the rodent partner, or from human chromosome 
regions not represented in the hybrid cell. The method 
can also be used for isolating highly conserved DNA 
sequences between distinct species, and for isolating 
invariant DNA sequences between unrelated individuals from
the same species, e.g. regions of linkage disequilibrium 
spanning disease genes in human populations. The method 
may also be used for integrating positional information 
with expression profile, i.e. for isolating candidate 
genes and exons based on combining available information 
on map location and tissue restricted expression. The 
method may also be applicable to microdissection libraries 
and cDNA libraries. It will be apparent that many other 
applications are also possible. 

The method of the invention is thus of general interest 
and relevance to all genome mapping, evolution, genetic 
disease and expression studies. 

The invention will be further described, by way of
illustration, in the following Example and by reference to
the accompanying figures, in which: 

Figure 1 is a schematic representation of one embodiment 
of the method of the invention applied to two DNA mixtures 
A and B; and 

Figure 2 is a Southern blot analysis of probes 3a and 4a.

Example

DNA sequences common to two mixtures of DNA, mixture A and 
mixture B, were recovered using the coincident sequence 
cloning (CSC) method illustrated schematically in Figure 
1. Briefly, DNA fragments from mixture A were converted
into short, orientated and single stranded molecules with 
added defined flanking sequences (from M13) at each end by 
first digesting with two restriction enzymes and then 
cloning into M13. A pair of synthetic capture
oligonucleotides ('capture oligos') was then annealed to 
this modified form of mixture A (library A). Mixture B 
was digested with the same restriction enzyme pair that 
was previously used to process mixture A, and then alkali 
denatured. Library A and mixture B were combined and 
allowed to anneal in a reaction driven to completion by 
the components of library A. Following a ligation step, 
mixture B sequences were purified from those of library A 
by preparative alkali agarose gel electrophoresis. 
Coincident species were then selectively recovered from 
this material by employing the polymerase chain reaction 
with primers derived from the capture oligonucleotide 
sequences. PCR products were finally cloned and 
analysed.

In this example, DNA mixtures of high complexity were
used, with mixture A comprising 1000 fragments of human
DNA, and mixture B comprising the total DNA of a somatic
cell hybrid called 1W1, which is a human-mouse hybrid with
chromosomes 11 and Xpter as the sole human component (see
reference 1).

To produce DNA mixture A, total human DNA was digested to
completion with the enzyme pair EcoRI/PstI and fragments
of size 0.1-0.5kb were isolated by preparative agarose gel
electrophoresis and cloned into EcoRI/PstI digested
M13mp18. 1,000 plaques were picked at random and grown in
150uL cultures. These were pooled for the preparation of
single stranded DNA. 5' and 3' capture oligonucleotides
(see below) were added to 1ug of this DNA at a molar ratio
of 1:1 in 40uL 10mM Tris/HCl, 1mM MgCl2 pH7.5 and the
mixture heated to 65°C and allowed to cool to 37°C over
approximately 30 minutes. To bind any unhybridised
capture oligonucleotides, oligonucleotide 485 (see below)
was then added, in 1uL H2O, at a molar ratio to each
capture oligo of 10:1. This mixture was left at 37°C for
15 minutes and then placed on ice.

15ug of the DNA mixture B was digested with the enzyme
pair EcoRI/PstI and then phenol, chloroform and ether
extracted and ethanol precipitated. Following
resuspension in 50uL H2O, the sample was denatured by
adding 50uL 0.34M NaOH and placing at 37°C for 30 minutes.
100uL of a prechilled 1:1 mixture of 0.34M HCl and 0.1M
Tris/HCl pH7.5 was added to neutralise the solution and
the sample placed on ice. The modified DNA from the first
mixture was then added and the total sample ethanol
precipitated. After resuspension in 18uL H2O, 3uL of
4M NaCl, 50mM EDTA, 0.1M Tris/HCl pH7.8 and 9uL formamide
were added and the mixture allowed to anneal by submerging
overnight in a 45°C waterbath. The sample was
precipitated and resuspended in 10uL 0.5M EDTA, 5mM
Tris/HCl pH7.5. A ligation was then performed at 16°C for
3 hours in 20uL 50mM Tris/HCl, 10mM MgCl2, 10mM
dithiothreitol, 1mM spermidine and 1mM ATP pH7.4 using 0.5
units of T4 DNA ligase.
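
The neutralisation and annealing conditions implied by these volumes can be checked with simple dilution arithmetic. The sketch below assumes that the "1:1 mixture" means equal volumes of the 0.34M HCl and 0.1M Tris/HCl stocks (50uL of each in the 100uL added), and that volumes are additive; the helper name `diluted` is hypothetical.

```python
import math


def diluted(stock_conc, stock_vol_ul, total_vol_ul):
    """Concentration after diluting a stock into a larger total volume."""
    return stock_conc * stock_vol_ul / total_vol_ul


# Denaturation: 50 uL sample + 50 uL 0.34 M NaOH.
naoh_umol = 0.34 * 50  # mol/L x uL gives umol
# Neutralisation (assumed): 50 uL of 0.34 M HCl within the 100 uL added.
hcl_umol = 0.34 * 50
acid_matches_base = math.isclose(naoh_umol, hcl_umol)

# Annealing: 18 uL sample + 3 uL salt stock + 9 uL formamide.
anneal_total_ul = 18 + 3 + 9
nacl_m = diluted(4.0, 3, anneal_total_ul)      # from 4 M NaCl
edta_mm = diluted(50.0, 3, anneal_total_ul)    # from 50 mM EDTA
tris_mm = diluted(100.0, 3, anneal_total_ul)   # from 0.1 M Tris/HCl
formamide_pct = 100 * 9 / anneal_total_ul      # % v/v

print(acid_matches_base, nacl_m, edta_mm, tris_mm, formamide_pct)
```

On these assumptions the acid added matches the base used for denaturation, and annealing takes place in roughly 0.4M NaCl, 5mM EDTA, 10mM Tris/HCl and 30% v/v formamide.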

The ligated DNA was passed through a 1.3% preparative
alkaline agarose electrophoretic gel (30mM NaOH, 2mM EDTA
buffer) from which single stranded DNA fragments in the
0.1-0.5kb size range were recovered as a set of five
fractions called F1 - F5. The fractions were recovered as
gel slices to be diluted twofold in H2O and melted at
65°C.

The five fractions were each amplified by PCR using 
primers derived from the capture oligo sequences in order 
to recover coincident DNA. 30 cycles of PCR were 
performed upon 1uL aliquots of the purified DNA using 
oligos 596/789 (see below). 1uL of these reactions was
then further amplified by 22 cycles of PCR using oligos 
790/996 (see below). 

All PCR reactions were carried out in 50uL PCR buffer
(10mM Tris/HCl, 50mM KCl, 1.5mM MgCl2, 0.2mM each
dATP/dTTP/dCTP/dGTP, 0.01% w/v gelatin, 0.05% each Tween20
and NP40 detergents, pH8.3 at 25°C) using 2 units Amplitaq
enzyme and a Hybaid Intelligent Heating Block on mode 2
(plate) control. Denaturing steps were at 99°C for 45
seconds with an extended time of 2 minutes for the first
cycle. Extension reactions were done at 74°C for
durations of 2 minutes for the first ten cycles, 2.5
minutes for the second ten cycles and 3 minutes for any
further cycles. Annealing steps were of 2 minute duration
at 58°C for oligos 596/789 and 52°C for oligos 790/996.
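
The cycling conditions above can be written out as an explicit per-cycle schedule. This is a sketch only: the step order (denature, anneal, extend) is assumed as the conventional one, ramp times are ignored, and the names `cycle_program` and `total_seconds` are hypothetical rather than anything tied to the heating block used.

```python
def cycle_program(n_cycles, anneal_temp_c):
    """Build a list of cycles; each cycle is a list of (temp_C, seconds)."""
    program = []
    for cycle in range(1, n_cycles + 1):
        # Denaturation is extended to 2 minutes in the first cycle only.
        denature_s = 120 if cycle == 1 else 45
        # Extension time grows with cycle number.
        if cycle <= 10:
            extend_s = 120
        elif cycle <= 20:
            extend_s = 150
        else:
            extend_s = 180
        program.append([(99, denature_s), (anneal_temp_c, 120), (74, extend_s)])
    return program


def total_seconds(program):
    """Sum of programmed hold times, excluding temperature ramps."""
    return sum(seconds for cycle in program for _, seconds in cycle)


first_round = cycle_program(30, 58)   # oligos 596/789
second_round = cycle_program(22, 52)  # oligos 790/996
print(total_seconds(first_round) / 60, total_seconds(second_round) / 60)
```

The two rounds form a nested amplification: the second pair of primers lies inside the first, so only products carrying both capture sequences continue to amplify.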

The PCR products were separated on a 1.3% neutral agarose
gel. Major bands were excised and recovered by agarose
digestion, phenol/chloroform extraction and ethanol
precipitation. These products were then cloned following
EcoRI/PstI digestion into the pBluescribe plasmid vector,
and plated to give product libraries F1 - F5.

To analyse the products of this experiment between 2 and 6 
clones were picked at random from each product library. 
These were examined by cross-hybridisation. In the case
of product library F3 two distinct isolates were thus
obtained. These were used as probes upon library F3,
enabling 14 non-hybridising colonies to be located. From
these a further 3 distinct products were subsequently
identified. These results are summarised in Table 1.

All distinct products were used to probe Southern blots of 
EcoRI digested human, mouse and 1W1 DNA in the absence of 
competitor DNA. In all cases the probes had clearly been 
derived from human sequences present within 1W1. In the
majority of cases a single hybridising band was detected
with only 1 case showing hybridisation to a high copy 
number repeat element. By sequencing each product it was 
observed that a number of isolates shared a similar 
sequence even though all had given unique single bands 
upon Southern analysis. One such product was therefore 
used as a probe on genomic blots under various 
stringencies. This family of products were thus shown to 
have been derived from a low copy repeat element. 
Sequence database searches were also performed for each 
coincident sequence cloning product. A summary of these
results is presented in Table 2, in which "Line 1" is
human KpnI repeat (reference 2), "L1HEG" is a sequence
within a repeat rich region at the beta-globin cluster
(reference 3), and "P450c17/SigmaG3" is a sequence
upstream of both P450c17 (reference 4) and immunoglobulin
heavy C gamma 3 (reference 5) genes. Examples of Southern
blot results are shown in Figure 2. Washing stringencies
for each experiment are given in the Figure.

All of the products obtained were found to be genuinely
coincident, indicating a specificity of 100%. Furthermore,
given the proportion of the human genome present within
1W1 (5%) and the number of mixture A molecules examined
(1,000), the eight products isolated indicate a
minimum recovery efficiency of 16%. The spectrum of the
coincident species obtained probably reflects their
abundance and/or ease of amplification by PCR.
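
The 16% figure follows from the numbers stated above, as the short calculation below shows. It assumes that the 1,000 mixture A fragments are a random sample of the human genome, so that about 5% of them are expected to lie within the human component of 1W1; the variable names are illustrative only.

```python
import math

mixture_a_fragments = 1000      # clones pooled to make library A
fraction_in_hybrid = 0.05       # share of the human genome present in 1W1
distinct_products = 8           # coincident products actually isolated

# Fragments of mixture A expected to be coincident with 1W1.
expected_coincident = mixture_a_fragments * fraction_in_hybrid

# Lower bound on recovery, since only a few clones per library were examined.
min_recovery = distinct_products / expected_coincident

print(f"expected coincident fragments: {expected_coincident:.0f}")
print(f"minimum recovery efficiency: {min_recovery:.0%}")
```

The value is a minimum because further distinct products may remain unsampled in the product libraries.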

Oligonucleotide sequences

5' capture oligo:

GGACGGGTCGACACGCGAGGAGCCAAGCTTGCATGCCTGCA

3' capture oligo:

AATTCGTAATCATGGTCATAGAGCACCCGTGCTACCGGAACG

485:

TGATTACGAATTGGTGCAGGCATGCAAG

PCR oligos:

596 GGACGGGTCGACACGCGAGG

789 CGTTCCGGTAGCACGGG

790 GCCAAGCTTGCATGCCTG

996 GCTCTATGACCATGATTACG
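
The relationships between the listed oligonucleotides can be checked by simple string comparison. The sketch below uses the sequences as listed (with the spacing in oligonucleotide 485 removed); the helper names `revcomp` and `derived_from` are hypothetical, and the check says nothing about hybridisation behaviour in practice.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


CAP5 = "GGACGGGTCGACACGCGAGGAGCCAAGCTTGCATGCCTGCA"
CAP3 = "AATTCGTAATCATGGTCATAGAGCACCCGTGCTACCGGAACG"
OLIGO_485 = "TGATTACGAATTGGTGCAGGCATGCAAG"
PRIMERS = {
    "596": "GGACGGGTCGACACGCGAGG",
    "789": "CGTTCCGGTAGCACGGG",
    "790": "GCCAAGCTTGCATGCCTG",
    "996": "GCTCTATGACCATGATTACG",
}


def derived_from(primer):
    """Report which capture oligo a primer matches, and in which sense."""
    for name, capture in (("5' capture", CAP5), ("3' capture", CAP3)):
        if primer in capture:
            return name, "same strand"
        if revcomp(primer) in capture:
            return name, "reverse complement"
    return None


relations = {number: derived_from(seq) for number, seq in PRIMERS.items()}

# Oligo 485 is complementary to the adjoining ends of the two capture oligos.
blocker = revcomp(OLIGO_485)
blocks_cap5_end = blocker[:14] == CAP5[-14:]
blocks_cap3_start = blocker[-12:] == CAP3[:12]

print(relations, blocks_cap5_end, blocks_cap3_start)
```

On this reading, 596 and 790 are nested forward primers taken from the 5' capture oligo, 789 and 996 are nested reverse primers complementary to the 3' capture oligo, and 485 pairs with the insert-proximal end of each capture oligo, which is consistent with its use to bind unhybridised capture oligonucleotides.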

Table 1 CSC products examined and their product library of origin.

PRODUCT LIBRARY    NUMBER OF CLONES EXAMINED    DISTINCT PRODUCTS

Claims

1. A method of recovering from two mixtures of DNA those
sequences present in both mixtures, comprising: treating
DNA in a first mixture in a manner comprising the use of
restriction enzyme(s) to produce single stranded DNA
fragments with added defined flanking sequences flanking
the single stranded DNA fragments; annealing capture
oligonucleotides to the added defined flanking sequences
of the resulting single stranded DNA; treating DNA in a
second mixture using the same restriction enzyme(s) to
produce single stranded fragments; combining the products
obtained from the first and second mixtures and allowing
the single stranded fragments to anneal; joining annealed
single stranded DNA from the second mixture present as a
heteroduplex to the capture oligonucleotides of DNA from
the first mixture; and recovering from the resulting
mixture sequences captured in the form of heteroduplex
coincident DNA including the capture oligonucleotides.

2. A method according to claim 1, wherein DNA from the 
first mixture is produced in single stranded form by 
cloning in M13. 

3. A method according to claim 1, wherein DNA from the 
first mixture is produced in single stranded form by PCR. 

4. A method according to any one of the preceding claims,
wherein the capture oligonucleotides are in the range of 
30-40 bases long. 

5. A method according to any one of the preceding claims,
wherein an oligonucleotide complementary to the capture
oligonucleotides is added to the products of the first DNA
mixture before combining the products of the first and
second DNA mixtures.

6. A method according to any one of the preceding claims, 
wherein heteroduplex coincident DNA is recovered from a 
mixture by means of PCR. 

7. A method according to any one of the preceding claims,
wherein the products of the second DNA mixture possess
100% sequence homology with the products of the first DNA
mixture to which they anneal.

8. A method according to any one of the preceding claims, 
wherein the coincident sequence identified by the method 
is a DNA sequence which is highly conserved between 
distinct species.

9. A method according to any one of claims 1 to 7,
wherein the coincident sequence identified by the method 
is a DNA sequence which is invariant between unrelated 
individuals of the same species.

[Figure 1: schematic of the method. Mixture A (fragments bounded by RE1 and RE2 sites): double digestion; convert to single stranded version with defined ends; anneal capture oligos. Mixture B (fragments bounded by RE1 and RE2 sites): double digestion; denature. The two are annealed, giving noncoincident DNA from A, heteroduplex coincident DNA, and noncoincident DNA from B. Ligate, isolate and PCR amplify heteroduplex.]

[Figure 2: Southern blot panels. Washing stringencies shown: 0.1 x SSC, 0.1% SDS, 65°C; 6 x SSC, 0.1% SDS, 65°C; 0.1 x SSC, 0.1% SDS, 65°C.]